Exploring PSI-MI XML Collections Using DescribeX
Abstract
PSI-MI has been endorsed by the protein informatics community as a standard XML data exchange format for protein-protein interaction datasets. While many public databases support the standard, there is a degree of heterogeneity in the way the proposed XML schema is interpreted and instantiated by different data providers. Analysis of schema instantiation in large collections of XML data is a challenging task that is unsupported by existing tools. In this study we use DescribeX, a novel visualization technique for (semi-)structured XML formats, to analyze PSI-MI XML collections quantitatively and qualitatively at the instance level. The goal is to gain insight into schema usage and to study specific questions such as the adequacy of controlled vocabularies, the detection of common instance patterns, and the evolution of different data collections. Our analysis shows that DescribeX enhances understanding of the instance-level structure of PSI-MI data sources and is a useful tool for standards designers, software developers, and PSI-MI data providers.
Similar resources
DescribeX: A Framework for Exploring and Querying XML Web Collections
Flavio Rizzolo, Doctor of Philosophy, Graduate Department of Computer Science, University of Toronto, 2008. The nature of semistructured data in web collections is evolving. Even when XML web documents are valid with regard to a schema, the actual structure of such documents exhibits significant variations across collections for s...
Fast Answering of XPath Query Workloads on Web Collections
Several web applications (such as processing RSS feeds or web service messages) rely on XPath-based data manipulation tools. Web developers need to use XPath queries effectively on increasingly large web collections containing hundreds of thousands of XML documents. Even when tasks only need to deal with a single document at a time, developers benefit from understanding the behaviour of XPath ...
RpsiXML: Application Examples
RpsiXML allows communication between protein interaction data stored in PSI-MI XML format and the statistical and computational environment of R and Bioconductor. In the vignette RpsiXML, we introduced how to read in PSI-MI XML 2.5 files with RpsiXML. In this vignette, we illustrate the use of RpsiXML with examples. These applications demonstrate the power of the package in analyzing protein...
Summary-based Comparison of Data Quality across Public MAGE-ML Genomic Datasets
In this paper we apply techniques based on DescribeX, a summary-based visualization tool for XML, to analyze data quality in MAGE-ML datasets, extending our previous work by comparing different data sources and data quality evolution.
Capturing cooperative interactions with the PSI-MI format
The complex biological processes that control cellular function are mediated by intricate networks of molecular interactions. Accumulating evidence indicates that these interactions are often interdependent, thus acting cooperatively. Cooperative interactions are prevalent in and indispensable for reliable and robust control of cell regulation, as they underlie the conditional decision-making c...
Journal: J. Integrative Bioinformatics
Volume: 4
Publication date: 2007